(57) Abstract

A bridge for a multi-processor system includes bus interfaces for connection to an I/O bus of a first processing set, an I/O bus of
a second processing set and a device bus. A bridge control mechanism is operable to permit direct memory access to memory of the
processing sets by a device on the device bus, to arbitrate between the first and the second processing sets for access to the bridge in a
first, split, mode and to monitor lockstep operation of the first and second processing sets in a second, combined, mode. The dirty RAM
mechanism defines a dirty indicator (e.g., a bit) for each of a plurality of regions of processing set memory, a dirty indicator being set to a
predetermined value when the region of memory has been written to by a DMA access. One of the processing sets can be operable in the
split mode as a primary processing set to copy the content of its memory to the other processing set(s) and to recopy regions which become
identified by the dirty RAM mechanism as having been written to by virtue of the corresponding dirty indication being set. In response to a
synchronization reset operation from the primary processing set, on completion of copying the content of the memory regions identified in
the dirty RAM mechanism with no further regions having been so identified, the bridge can transfer from the split mode to the combined
mode.
WO 99/66402 PCT/US99/12429
TITLE: TRACKING MEMORY PAGE MODIFICATION IN A BRIDGE FOR A MULTI-PROCESSOR 
SYSTEM 

BACKGROUND OF THE INVENTION 

This invention relates to a multi-processor computer system including first and second processing sets
(each of which may comprise one or more processors), which communicate with an I/O device bus.

The application finds particular application to fault tolerant computer systems where two or more
processor sets need to communicate with an I/O device bus in lockstep with provision for identifying lockstep
errors in order to identify faulty operation of the system as a whole.

In such a fault tolerant computer system, an aim is not only to be able to identify faults, but also to provide
a structure which is able to provide a high degree of system availability. In order to provide high levels of system
availability, it would be desirable for such systems to automatically attempt recovery from a lockstep error.

As part of such an automatic recovery process it is necessary to reintegrate the state of the processing sets
to a common status in order to attempt a restart in lockstep. An approach to achieving this is to copy the complete
state of one of the processing sets (i.e. the "good" one) to the other processing set. This involves ensuring that the
content of the memory of both processors is the same before trying a restart in lockstep mode.

However, a problem with the copying of the content of the memory from one processing set to the other is
that during this time devices connected to the I/O bus may be making direct memory access (DMA) to the memory
of the processing set(s). If a write is made to an area of memory which has already been copied, this would result
in the memory state in the processing sets at the end of the copy not being the same.

It has been proposed to employ a dirty RAM in a processor to indicate areas of memory which have been
changed since the dirty RAM was last reset. A dirty RAM is a bit map having a bit for each block, or page, of
memory, which bit is set when a write access to the area of memory concerned is made. However, the provision of
a dirty RAM in the processing sets would not provide a reliable solution to the problem of reinstating the memory
of the processor because of the difficulties and delays in accessing the dirty RAM of other processing sets.

An aim of the present invention is to provide a solution to the problem of addressing direct memory
accesses in achieving reinstatement of a concurrent state in first and second processing sets.

SUMMARY OF THE INVENTION

Particular and preferred aspects of the invention are set out in the accompanying independent and 
dependent claims. Combinations of features from the dependent claims may be combined with features of the 
independent claims as appropriate and not merely as explicitly set out in the claims. 

In accordance with one aspect of the invention, there is provided a bridge for a multi-processor system.
The bridge comprises bus interfaces for connection to an I/O bus of a first processing set, an I/O bus of a second
processing set and a device bus. A bridge control mechanism is operable to permit direct memory access to
memory of the processing sets by a device on the device bus, to arbitrate between the first and the second
processing sets for access to the bridge in a first, split, mode, and to monitor lockstep operation of the first and
second processing sets in a second, combined, mode. A dirty RAM mechanism is provided in the bridge for
monitoring regions of processor set memory modified by direct memory accesses by the device on the device
bus.

An embodiment of the invention is thus able to monitor parts of memory modified by DMA operations
initiated by a device on the device bus. Providing the dirty RAM mechanism in the bridge facilitates access
to the dirty RAM by the processing sets. The reintegration process can involve a number of passes, during each
of which dirtied memory is copied from a good processing set to a faulty (target) processing set or sets.
During the process of re-integration the good processing set can access the dirty RAM to determine the parts of
the memory which have been dirtied (in either its own or the target processing set's memory) to be copied on
any pass.

It should be noted that the bus interfaces referenced above need not be separate components of the
bridge, but may be incorporated in other components of the bridge, and may indeed be simply connections for
the lines of the buses concerned.

In an embodiment of the invention, the dirty RAM mechanism defines a dirty indicator (e.g., a bit) for
each of a plurality of regions of processing set memory, a dirty indicator being set to a predetermined value
when the region of memory has been written to by a DMA access.

The processing sets can be configured such that one of the processing sets is operable in the split mode
as a primary processing set and to copy the content of its memory to the other processing set(s). If during this
copy operation some of the regions of the memory are written to by a direct memory access, the state at the end
of the copy operation will not be the same in the various processing sets. As a result the primary processing set
re-copies those regions of its memory which have been marked in the dirty RAM mechanism as having been
written to by virtue of the corresponding dirty indication being set. This process can be repeated in a number of
passes as required.
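
The multi-pass copy described above can be illustrated with a minimal sketch. It assumes memory is modelled as a list of page values, that the dirty RAM is modelled as a set of page indices, and that DMA writes land on the primary after the pages of the current pass have been copied. The function name `reintegrate` and the `dma_writes_per_pass` parameter are hypothetical and are not taken from the specification.

```python
def reintegrate(primary, target, dirty, dma_writes_per_pass):
    """Copy primary to target, recopying dirtied pages until a pass finds none.

    primary, target: lists of page contents of equal length (hypothetical model).
    dirty: set of page indices standing in for the bridge's dirty RAM.
    dma_writes_per_pass: for each pass, a list of (page, value) DMA writes
        that arrive while that pass is in progress (simulation input only).
    Returns the number of passes performed.
    """
    pending = set(range(len(primary)))  # the first pass copies every page
    passes = 0
    while pending:
        writes = dma_writes_per_pass[passes] if passes < len(dma_writes_per_pass) else []
        for page in sorted(pending):
            target[page] = primary[page]
        # DMA writes that hit already-copied pages mark those pages dirty.
        for page, value in writes:
            primary[page] = value
            dirty.add(page)
        passes += 1
        pending = set(dirty)  # read the dirty indicators ...
        dirty.clear()         # ... and clear them for the next pass
    return passes
```

In this model the loop ends only after a pass during which no page was dirtied, which corresponds to the condition of no further regions having been identified.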

In an embodiment of the invention, the bridge control mechanism comprises an arbiter connected to
the first and second processor bus interfaces and to the device bus interface, the arbiter being configured to be
operable in the split mode to arbitrate for use of the bridge by the first and second processing sets and devices
on the device bus. The bridge control mechanism is configured to be operable to respond to a synchronization
reset operation from the primary processing set, on completion of copying the content of the memory regions
identified in the dirty RAM mechanism with no further regions having been so identified, to transfer from the
split mode of operation to the combined mode of operation.

The dirty RAM mechanism can comprise a dirty RAM configured in random access memory in the
bridge. Alternatively, a separate hardware memory device may be provided. The content of the dirty RAM can
be cleared on being read by a processing set. Alternatively, two dirty RAMs can be provided, the two dirty
RAMs being operable in a toggle mode with one being written to while the other is being read. Optionally, a
respective dirty RAM could be provided for each processing set.
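
As an illustration of the toggle-mode alternative, the following sketch models two dirty RAM banks, one collecting new DMA write marks while the other is read. The class `ToggledDirtyRam` and its method names are hypothetical, and clearing the bank after it has been read is an assumption made so that the bank can be reused on the next toggle.

```python
class ToggledDirtyRam:
    """Two dirty-bit banks used alternately (illustrative model only)."""

    def __init__(self, pages):
        self._banks = [[False] * pages, [False] * pages]
        self._write_bank = 0  # bank currently receiving DMA write marks

    def mark(self, page):
        """Record a DMA write to the given page in the active bank."""
        self._banks[self._write_bank][page] = True

    def swap_and_read(self):
        """Toggle banks, then return and clear the pages marked in the old bank."""
        read_bank = self._write_bank
        self._write_bank ^= 1  # new marks now go to the other bank
        dirty = [p for p, bit in enumerate(self._banks[read_bank]) if bit]
        self._banks[read_bank] = [False] * len(self._banks[read_bank])
        return dirty
```

Because marks made during a read go to the other bank, no DMA write is lost between reading the indicators and clearing them.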

There may be more than two processor bus interfaces for connection to I/O buses of respective
processing sets.

In accordance with another aspect of the invention, there is provided a computer system comprising
a first processing set having an I/O bus, a second processing set having an I/O bus, a device bus, at least one
device on the device bus and a bridge as set out above. Each processing set may comprise at least one
processor, memory and a processing set I/O bus controller.

In accordance with a further aspect of the invention, there is provided a method of operating a
multi-processor system as set out above, the method comprising:

permitting direct memory access to memory of the processing sets by the at least one device on the
device bus; and

monitoring, in a dirty RAM in the bridge, regions of processor set memory written to by the device on
the device bus.

A method of re-integration can involve multiple passes of copying areas of memory from a first
processing set to a second processing set, the areas to be copied being identified as the areas of memory for which
a corresponding dirty RAM bit is set.

The re-integration method can include a step of preventing direct memory access to restart in a combined,
or lockstep, mode.

BRIEF DESCRIPTION OF THE DRAWINGS 

Exemplary embodiments of the present invention will be described hereinafter, by way of example only, 
with reference to the accompanying drawings in which like reference signs relate to like elements and in which: 

Figure 1 is a schematic overview of a fault tolerant computer system incorporating an embodiment of the
invention;

Figure 2 is a schematic overview of a specific implementation of a system based on that of Figure 1;
Figure 3 is a schematic representation of one implementation of a processing set; 
Figure 4 is a schematic representation of another example of a processing set; 
Figure 5 is a schematic representation of a further processing set; 

Figure 6 is a schematic block diagram of an embodiment of a bridge for the system of Figure 1; 

Figure 7 is a schematic block diagram of storage for the bridge of Figure 6; 

Figure 8 is a schematic block diagram of control logic of the bridge of Figure 6; 

Figure 9 is a schematic representation of a routing matrix of the bridge of Figure 6; 

Figure 10 is an example implementation of the bridge of Figure 6; 

Figure 11 is a state diagram illustrating operational states of the bridge of Figure 6;

Figure 12 is a flow diagram illustrating stages in the operation of the bridge of Figure 6; 

Figure 13 is a detail of a stage of operation from Figure 12; 

Figure 14 illustrates the posting of I/O cycles in the system of Figure 1;

Figure 15 illustrates the data stored in a posted write buffer; 

Figure 16 is a schematic representation of a slot response register; 

Figure 17 illustrates a dissimilar data write stage; 

Figure 18 illustrates a modification to Figure 17; 

Figure 19 illustrates a dissimilar data read stage; 

Figure 20 illustrates an alternative dissimilar data read stage; 
Figure 21 is a flow diagram summarising the operation of a dissimilar data write mechanism;
Figure 22 is a schematic block diagram explaining arbitration within the system of Figure 1;
Figure 23 is a state diagram illustrating the operation of a device bus arbiter;
Figure 24 is a state diagram illustrating the operation of a bridge arbiter; 
Figure 25 is a timing diagram for PCI signals; 

Figure 26 is a schematic diagram illustrating the operation of the bridge of Figure 6 for direct memory
access;

Figure 27 is a flow diagram illustrating a direct memory access method in the bridge of Figure 6; and
Figure 28 is a flow diagram of a re-integration process including the monitoring of a dirty RAM.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Figure 1 is a schematic overview of a fault tolerant computing system 10 comprising a plurality of
CPUsets (processing sets) 14 and 16 and a bridge 12. As shown in Figure 1, there are two processing sets 14
and 16, although in other embodiments there may be three or more processing sets. The bridge 12 forms an
interface between the processing sets and I/O devices such as devices 28, 29, 30, 31 and 32. In this document,
the term "processing set" is used to denote a group of one or more processors, possibly including memory,
which output and receive common outputs and inputs. It should be noted that the alternative term mentioned
above, "CPUset", could be used instead, and that these terms could be used interchangeably throughout this
document. Also, it should be noted that the term "bridge" is used to denote any device, apparatus or
arrangement suitable for interconnecting two or more buses of the same or different types.

The first processing set 14 is connected to the bridge 12 via a first processing set I/O bus (PA bus) 24,
in the present instance a Peripheral Component Interconnect (PCI) bus. The second processing set 16 is
connected to the bridge 12 via a second processing set I/O bus (PB bus) 26 of the same type as the PA bus 24
(i.e. here a PCI bus). The I/O devices are connected to the bridge 12 via a device I/O bus (D bus) 22, in the
present instance also a PCI bus.

Although, in the particular example described, the buses 22, 24 and 26 are all PCI buses, this is merely
by way of example, and in other embodiments other bus protocols may be used and the D bus 22 may have a
different protocol from that of the PA bus and the PB bus (P buses) 24 and 26.

The processing sets 14 and 16 and the bridge 12 are operable in synchronism under the control of a 
common clock 20, which is connected thereto by clock signal lines 21. 

Some of the devices, including an Ethernet (E-NET) interface 28 and a Small Computer System
Interface (SCSI) interface 29, are permanently connected to the device bus 22, but other I/O devices such as I/O
devices 30, 31 and 32 can be hot insertable into individual switched slots 33, 34 and 35. Dynamic field effect
transistor (FET) switching can be provided for the slots 33, 34 and 35 to enable hot insertability of the devices
such as devices 30, 31 and 32. The provision of the FETs enables an increase in the length of the D bus 22 as
only those devices which are active are switched on, reducing the effective total bus length. It will be
appreciated that the number of I/O devices which may be connected to the D bus 22, and the number of slots
provided for them, can be adjusted according to a particular implementation in accordance with specific design
requirements.

Figure 2 is a schematic overview of a particular implementation of a fault tolerant computer employing
a bridge structure of the type illustrated in Figure 1. In Figure 2, the fault tolerant computer system includes a
plurality (here four) of bridges 12 on first and second I/O motherboards (MB 40 and MB 42) in order to increase
the number of I/O devices which may be connected and also to improve reliability and redundancy. Thus, in
the embodiment shown in Figure 2, two processing sets 14 and 16 are each provided on a respective processing
set board 44 and 46, with the processing set boards 44 and 46 'bridging' the I/O motherboards MB 40 and MB
42. A first, master clock source 20A is mounted on the first motherboard 40 and a second, slave clock source
20B is mounted on the second motherboard 42. Clock signals are supplied to the processing set boards 44 and
46 via respective connections (not shown in Figure 2).

First and second bridges 12.1 and 12.2 are mounted on the first I/O motherboard 40. The first bridge
12.1 is connected to the processing sets 14 and 16 by P buses 24.1 and 26.1, respectively. Similarly, the second
bridge 12.2 is connected to the processing sets 14 and 16 by P buses 24.2 and 26.2, respectively. The bridge
12.1 is connected to an I/O databus (D bus) 22.1 and the bridge 12.2 is connected to an I/O databus (D bus)
22.2.

Third and fourth bridges 12.3 and 12.4 are mounted on the second I/O motherboard 42. The bridge
12.3 is connected to the processing sets 14 and 16 by P buses 24.3 and 26.3, respectively. Similarly, the bridge
12.4 is connected to the processing sets 14 and 16 by P buses 24.4 and 26.4, respectively. The bridge 12.3 is
connected to an I/O databus (D bus) 22.3 and the bridge 12.4 is connected to an I/O databus (D bus) 22.4.

It can be seen that the arrangement shown in Figure 2 can enable a large number of I/O devices to be
connected to the two processing sets 14 and 16 via the D buses 22.1, 22.2, 22.3 and 22.4 for either increasing
the range of I/O devices available, or providing a higher degree of redundancy, or both.

Figure 3 is a schematic overview of one possible configuration of a processing set, such as the
processing set 14 of Figure 1. The processing set 16 could have the same configuration. In Figure 3, a plurality
of processors (here four) 52 are connected by one or more buses 54 to a processing set bus controller 50. As
shown in Figure 3, one or more processing set output buses 24 are connected to the processing set bus controller
50, each processing set output bus 24 being connected to a respective bridge 12. For example, in the
arrangement of Figure 1, only one processing set I/O bus (P bus) 24 would be provided, whereas in the
arrangement of Figure 2, four such processing set I/O buses (P buses) 24 would be provided. In the processing
set 14 shown in Figure 3, individual processors operate using the common memory 56, and receive inputs and
provide outputs on the common P bus(es) 24.

Figure 4 is an alternative configuration of a processing set, such as the processing set 14 of Figure 1.
Here a plurality of processor/memory groups 61 are connected to a common internal bus 64. Each
processor/memory group 61 includes one or more processors 62 and associated memory 66 connected to an
internal group bus 63. An interface 65 connects the internal group bus 63 to the common internal bus 64.
Accordingly, in the arrangement shown in Figure 4, individual processing groups, with each of the processors
62 and associated memory 66, are connected via a common internal bus 64 to a processing set bus controller 60.
The interfaces 65 enable a processor 62 of one processing group to operate not only on the data in its local
memory 66, but also in the memory of another processing group 61 within the processing set 14. The
processing set bus controller 60 provides a common interface between the common internal bus 64 and the
processing set I/O bus(es) (P bus(es)) 24 connected to the bridge(s) 12. It should be noted that although only
two processing groups 61 are shown in Figure 4, it will be appreciated that such a structure is not limited to this
number of processing groups.

Figure 5 illustrates an alternative configuration of a processing set, such as the processing set 14 of
Figure 1. Here a simple processing set includes a single processor 72 and associated memory 76 connected via
a common bus 74 to a processing set bus controller 70. The processing set bus controller 70 provides an
interface between the internal bus 74 and the processing set I/O bus(es) (P bus(es)) 24 for connection to the
bridge(s) 12.

Accordingly, it will be appreciated from Figures 3, 4 and 5 that the processing set may have many
different forms and that the particular choice of a particular processing set structure can be made on the basis of
the processing requirement of a particular application and the degree of redundancy required. In the following
description, it is assumed that the processing sets 14 and 16 referred to have a structure as shown in Figure 3,
although it will be appreciated that another form of processing set could be provided.

The bridge(s) 12 are operable in a number of operating modes. These modes of operation will be
described in more detail later. However, to assist in a general understanding of the structure of the bridge, the
two operating modes will be briefly summarized here. In a first, combined mode, a bridge 12 is operable to
route addresses and data between the processing sets 14 and 16 (via the PA and PB buses 24 and 26,
respectively) and the devices (via the D bus 22). In this combined mode, I/O cycles generated by the processing
sets 14 and 16 are compared to ensure that both processing sets are operating correctly. Comparison failures
force the bridge 12 into an error limiting mode (EState) in which device I/O is prevented and diagnostic
information is collected. In the second, split mode, the bridge 12 routes and arbitrates addresses and data from
one of the processing sets 14 and 16 onto the D bus 22 and/or onto the other one of the processing sets 16 and
14, respectively. In this mode of operation, the processing sets 14 and 16 are not synchronized and no I/O
comparisons are made. DMA operations are also permitted in both modes. As mentioned above, the different
modes of operation, including the combined and split modes, will be described in more detail later. However,
there now follows a description of the basic structure of an example of the bridge 12.
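
The combined-mode comparison summarized above can be sketched as follows. This is a simplified model that assumes an I/O cycle can be represented as a plain tuple and that a mismatch simply yields the EState with no cycle forwarded. The `Mode` enumeration and the `step_combined` function are hypothetical names introduced for illustration.

```python
from enum import Enum


class Mode(Enum):
    COMBINED = "combined"  # lockstep operation, I/O cycles compared
    SPLIT = "split"        # processing sets unsynchronized, no comparison
    ESTATE = "estate"      # error limiting mode, device I/O prevented


def step_combined(mode, cycle_a, cycle_b):
    """Compare one I/O cycle from each processing set in the combined mode.

    Returns (new_mode, forwarded_cycle). A matching pair is forwarded once;
    a mismatch moves the bridge model to the EState and forwards nothing.
    """
    if mode is not Mode.COMBINED:
        raise ValueError("I/O comparison only applies in the combined mode")
    if cycle_a == cycle_b:
        return Mode.COMBINED, cycle_a
    return Mode.ESTATE, None
```

For example, two identical write cycles leave the model in the combined mode, whereas cycles that differ in address or data produce `Mode.ESTATE`.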

Figure 6 is a schematic functional overview of the bridge 12 of Figure 1. First and second processing
set I/O bus interfaces, PA bus interface 84 and PB bus interface 86, are connected to the PA and PB buses 24
and 26 respectively. A device I/O bus interface, D bus interface 82, is connected to the D bus 22. It should be
noted that the PA, PB and D bus interfaces need not be configured as separate elements but could be
incorporated in other elements of the bridge. Accordingly, within the context of this document, where a
reference is made to a bus interface, this does not require the presence of a specific separate component, but
rather the capability of the bridge to connect to the bus concerned, for example by means of physical or logical
bridge connections for the lines of the buses concerned.

Routing (hereinafter termed a routing matrix) 80 is connected via a first internal path 94 to the PA bus
interface 84 and via a second internal path 96 to the PB bus interface 86. The routing matrix 80 is further
connected via a third internal path 92 to the D bus interface 82. The routing matrix 80 is thereby able to provide
I/O bus transaction routing in both directions between the PA and PB bus interfaces 84 and 86. It is also able to
provide routing in both directions between one or both of the PA and PB bus interfaces and the D bus interface
82. The routing matrix 80 is connected via a further internal path 100 to storage control logic 90. The storage
control logic 90 controls access to bridge registers 110 and to a random access memory (SRAM) 126. The
routing matrix 80 is therefore also operable to provide routing in both directions between the PA, PB and D bus
interfaces 84, 86 and 82 and the storage control logic 90. The routing matrix 80 is controlled by bridge control
logic 88 over control paths 98 and 99. The bridge control logic 88 is responsive to control signals, data and
addresses on internal paths 93, 95 and 97, and also to clock signals on the clock line(s) 21.

In the embodiment of the invention, each of the P buses (PA bus 24 and PB bus 26) operates under a
PCI protocol. The processing set bus controllers 50 (see Figure 3) also operate under the PCI protocol.
Accordingly, the PA and PB bus interfaces 84 and 86 each provide all the functionality required for a
compatible interface providing both master and slave operation for data transferred to and from the D bus 22 or
internal memories and registers of the bridge in the storage subsystem 90. The bus interfaces 84 and 86 can
provide diagnostic information to internal bridge status registers in the storage subsystem 90 on transition of the
bridge to an error state (EState) or on detection of an I/O error.

The device bus interface 82 performs all the functionality required for a PCI compliant master and
slave interface for transferring data to and from one of the PA and PB buses 84 and 86. The D bus interface 82 is
operable during direct memory access (DMA) transfers to provide diagnostic information to internal status
registers in the storage subsystem 90 of the bridge on transition to an EState or on detection of an I/O error.

Figure 7 illustrates in more detail the bridge registers 110 and the SRAM 126. The storage control
logic 90 is connected via a path (e.g. a bus) 112 to a number of register components 114, 116, 118, 120. The
storage control logic is also connected via a path (e.g. a bus) 128 to the SRAM 126 in which a posted write
buffer component 122 and a dirty RAM component 124 are mapped. Although a particular configuration of the
components 114, 116, 118, 120, 122 and 124 is shown in Figure 7, these components may be configured in
other ways, with other components defined as regions of a common memory (e.g. a random access memory
such as the SRAM 126, with the path 112/128 being formed by the internal addressing of the regions of
memory). As shown in Figure 7, the posted write buffer 122 and the dirty RAM 124 are mapped to different
regions of the SRAM memory 126, whereas the registers 114, 116, 118 and 120 are configured as separate from
the SRAM memory.

Control and status registers (CSRs) 114 form internal registers which allow the control of various
operating modes of the bridge, allow the capture of diagnostic information for an EState and for I/O errors, and
control processing set access to PCI slots and devices connected to the D bus 22. These registers are set by
signals from the routing matrix 80.

Dissimilar data registers (DDRs) 116 provide locations for containing dissimilar data for different
processing sets to enable non-deterministic data events to be handled. These registers are set by signals from
the PA and PB buses.

Bridge decode logic enables a common write to disable a data comparator and allow writes to two
DDRs 116, one for each processing set 14 and 16.
A selected one of the DDRs can then be read in-sync by the processing sets 14 and 16. The DDRs thus
provide a mechanism enabling a location to be reflected from one processing set (14/16) to another (16/14).
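
The role of the DDRs can be illustrated with a minimal sketch, assuming one register per processing set labelled "A" and "B". The class `DissimilarDataRegisters` and its methods are hypothetical and model only the write of differing values followed by an in-sync read of one selected register.

```python
class DissimilarDataRegisters:
    """One register per processing set (illustrative model only)."""

    def __init__(self):
        self._regs = {"A": None, "B": None}

    def write(self, value_a, value_b):
        """Common write with the data comparator disabled: values may differ."""
        self._regs["A"] = value_a
        self._regs["B"] = value_b

    def read_in_sync(self, selected):
        """Both processing sets read the same selected register."""
        value = self._regs[selected]
        return value, value  # (value seen by set A, value seen by set B)
```

After the in-sync read both processing sets hold the same datum, so a non-deterministic value such as a locally read timer no longer causes their subsequent I/O cycles to diverge.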

Slot response registers (SRRs) 118 determine ownership of device slots on the D bus 22 and allow
DMA to be routed to the appropriate processing set(s). These registers are linked to address decode logic.

Disconnect registers 120 are used for the storage of data phases of an I/O cycle which is aborted while
data is in the bridge on the way to another bus. The disconnect registers 120 receive all data queued in the bridge
when a target device disconnects a transaction, or as the EState is detected. These registers are connected to the
routing matrix 80. The routing matrix can queue up to three data words and byte enables. Provided the initial
addresses are voted as being equal, address target controllers derive addresses which increment as data is
exchanged between the bridge and the destination (or target). Where a writer (for example a processor I/O write, or
a DVMA (D bus to P bus access)) is writing data to a target, this data can be caught in the bridge when an error
occurs. Accordingly, this data is stored in the disconnect registers 120 when an error occurs. These disconnect
registers can then be accessed on recovery from an EState to recover the data associated with the write or read cycle
which was in progress when the EState was initiated.

Although shown separately, the DDRs 116, the SRRs 118 and the disconnect registers may form an
integral part of the CSRs 114.

EState and error CSRs 114 are provided for the capture of a failing cycle on the P buses 24 and 26, with an
indication of the failing datum. Following a move to an EState, all of the writes initiated to the P buses are
logged in the posted write buffer 122. These may be other writes that have been posted in the processing set
bus controllers 50, or which may be initiated by software before an EState interrupt causes the processors to
stop carrying out writes to the P buses 24 and 26.

A dirty RAM 124 is used to indicate which pages of the main memory 56 of the processing sets 14 and
16 have been modified by direct memory access (DMA) transactions from one or more devices on the D bus 22.
Each page (e.g. each 8K page) is marked by a single bit in the dirty RAM 124 which is set when a DMA write
occurs and can be cleared by a read and clear cycle initiated on the dirty RAM 124 by a processor 52 of a
processing set 14 and 16.
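
A minimal sketch of such a bit map follows, assuming 8K pages (a shift of 13 bits) and a byte-addressed memory. The class `DirtyRam`, its method names and the choice of clearing the whole map in one read and clear cycle are assumptions made for illustration, not details of the described hardware.

```python
PAGE_SHIFT = 13  # 8K pages, as in the example given in the text


class DirtyRam:
    """One dirty bit per page of processing set memory (illustrative model)."""

    def __init__(self, memory_bytes):
        page_size = 1 << PAGE_SHIFT
        self._pages = (memory_bytes + page_size - 1) >> PAGE_SHIFT
        self._bits = bytearray((self._pages + 7) // 8)

    def note_dma_write(self, address, length=1):
        """Set the dirty bit of every page touched by a DMA write."""
        first = address >> PAGE_SHIFT
        last = (address + length - 1) >> PAGE_SHIFT
        for page in range(first, last + 1):
            self._bits[page >> 3] |= 1 << (page & 7)

    def read_and_clear(self):
        """Return the indices of dirty pages and clear all dirty bits."""
        dirty = [p for p in range(self._pages)
                 if self._bits[p >> 3] & (1 << (p & 7))]
        for i in range(len(self._bits)):
            self._bits[i] = 0
        return dirty
```

With this layout a 64 KB memory needs only eight bits, and a two-byte write starting at address 0x1FFF marks both page 0 and page 1 because it straddles the page boundary.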

The dirty RAM 124 and the posted write buffer 122 may both be mapped into the memory 126 in the
bridge 12. This memory space can be accessed during normal read and write cycles for testing purposes.

Figure 8 is a schematic functional overview of the bridge control logic 88 shown in Figure 6. 

All of the devices connected to the D bus 22 are addressed geographically. Accordingly, the bridge
carries out decoding necessary to enable the isolating FETs for each slot before an access to those slots is
initiated.

The address decoding performed by the address decode logic 136 and 138 essentially permits four 
basic access types: 

- an out-of-sync access (i.e. not in the combined mode) by one processing set (e.g. processing set 14 of
Figure 1) to the other processing set (e.g. processing set 16 of Figure 1), in which case the access is routed from
the PA bus interface 84 to the PB bus interface 86;

- an access by one of the processing sets 14 and 16 in the split mode, or both processing sets 14 and 16
in the combined mode to an I/O device on the D bus 22, in which case the access is routed via the D bus
interface 82;

- a DMA access by a device on the D bus 22 to one or both of the processing sets 14 and 16, which
would be directed to both processing sets 14 and 16 in the combined mode, or to the relevant processing set 14
or 16 if out-of-sync, and if in a split mode to a processing set 14 or 16 which owns a slot in which the device is
located; and

- a PCI configuration access to devices in I/O slots. 
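
The four access types listed above can be summarized in a small routing sketch. The function `route`, its string-valued arguments and the interface labels "PA", "PB" and "D" are hypothetical. Two behaviours are assumptions rather than statements of the text: configuration accesses are treated as being routed via the D bus interface, and a DMA access from a slot with no owner is routed nowhere.

```python
def route(kind, combined, initiator=None, slot_owner=None):
    """Return the set of bridge bus interfaces an access is routed to.

    kind: "set_to_set", "io", "dma" or "config" (hypothetical labels).
    combined: True in the combined mode, False in the split mode.
    initiator: "A" or "B" for a set_to_set access.
    slot_owner: "A", "B" or None for a DMA access in the split mode.
    """
    if kind == "set_to_set":
        if combined:
            raise ValueError("set-to-set access is an out-of-sync access")
        return {"PB"} if initiator == "A" else {"PA"}
    if kind in ("io", "config"):
        return {"D"}  # assumption: config accesses also go via the D bus
    if kind == "dma":
        if combined:
            return {"PA", "PB"}  # directed to both processing sets
        if slot_owner is None:
            return set()  # assumption: DMA from an un-owned slot is not routed
        return {"PA"} if slot_owner == "A" else {"PB"}
    raise ValueError("unknown access kind: " + kind)
```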

As mentioned above, geographic addressing is employed. Thus, for example, slot 0 on motherboard A 
has the same address when referred to by processing set 14 or by processing set 16. 

Geographic addressing is used in combination with the PCI slot FET switching. During a 
configuration access mentioned above, separate device select signals are provided for devices which are not
FET isolated. A single device select signal can be provided for the switched PCI slots as the FET signals can be 
used to enable a correct card. Separate FET switch lines are provided to each slot for separately switching the 
FETs for the slots. 

The SRRs 118, which could be incorporated in the CSR registers 114, are associated with the address
decode functions. The SRRs 118 serve in a number of different roles which will be described in more detail
later. However, some of the roles are summarized here.

In a combined mode, each slot may be disabled so that writes are simply acknowledged without any
transaction occurring on the device bus 22, whereby the data is lost. Reads will return meaningless data, once
again without causing a transaction on the device board.

In the split mode, each slot can be in one of three states. The states are: 

- Not owned; 

- Owned by processing set A 14; 

- Owned by processing set B 16. 

A slot that is not owned by a processing set 14 or 16 making an access (this includes not owned or un- 
owned slots) cannot be accessed. Accordingly, such an access is aborted. 

When a processing set 14 or 16 is powered off, all slots owned by it move to the un-owned state. A 
processing set 14 or 16 can only claim an un-owned slot; it cannot wrest ownership away from another 
processing set. This can only be done by powering off the other processing set, or by getting the other 
processing set to relinquish ownership. 
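
The ownership rules above can be illustrated with a minimal sketch. The class and function names below (Owner, Slot, power_off) are hypothetical and chosen only for illustration; the bridge implements these rules in hardware through the SRR bits, not in software.

```python
from enum import Enum


class Owner(Enum):
    NOT_OWNED = "not owned"
    SET_A = "processing set A"
    SET_B = "processing set B"


class Slot:
    """Ownership state of one device-bus slot in the split mode (illustrative)."""

    def __init__(self):
        self.owner = Owner.NOT_OWNED

    def claim(self, requester):
        # Only an un-owned slot can be claimed; ownership cannot be wrested away.
        if self.owner is Owner.NOT_OWNED:
            self.owner = requester
            return True
        return self.owner is requester

    def relinquish(self, requester):
        # Only the current owner can give the slot up.
        if self.owner is requester:
            self.owner = Owner.NOT_OWNED

    def access_allowed(self, requester):
        # An access by a processing set that does not own the slot is aborted.
        return self.owner is requester and requester is not Owner.NOT_OWNED


def power_off(slots, processing_set):
    # Powering off a processing set moves all of its slots to the un-owned state.
    for slot in slots:
        slot.relinquish(processing_set)
```

In this sketch a second processing set can claim a slot only after the first has relinquished it or has been powered off, which mirrors the two routes described above.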

The ownership bits are accessible and settable while in the combined mode, but have no effect until a 
split state is entered. This allows the configuration of a split system to be determined while still in the 
combined mode. 

Each PCI device is allocated an area of the processing set address map. The top bits of the address are 
determined by the PCI slot. Where a device carries out DMA, the bridge is able to check that the device is 
using the correct address because a D bus arbiter informs the bridge which device is using the bus at a particular 
time. If a device access is a processing set address which is not valid for it, then the device access will be 
ignored. It should be noted that an address presented by a device will be a virtual address which would be 
translated by an I/O memory management unit in the processing set bus controller 50 to an actual memory 
address. 
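
As a rough illustration of this check, the sketch below compares the slot encoded in the top bits of a DMA address with the slot that the D bus arbiter reports as currently using the bus. The address width and the number of slot bits are assumptions made for the example; the text does not specify them, and the function names are hypothetical.

```python
ADDRESS_BITS = 32   # assumed width, not given in the text
SLOT_BITS = 4       # assumed number of top address bits that identify the slot


def slot_of_address(address):
    """Return the slot number encoded in the top bits of a DMA address."""
    return (address >> (ADDRESS_BITS - SLOT_BITS)) & ((1 << SLOT_BITS) - 1)


def dma_access_valid(current_slot, address):
    """current_slot is the slot the D bus arbiter reports as using the bus."""
    return slot_of_address(address) == current_slot
```

An access for which dma_access_valid returns False corresponds to the case in which the device access is ignored.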

The addresses output by the address decoders are passed via the initiator and target controllers 138 and 
140 to the routing matrix 80 via the lines 98 under control of a bridge controller 132 and an arbiter 134. 

An arbiter 134 is operable in various different modes to arbitrate for use of the bridge on a 
first-come-first-served basis using conventional PCI bus signals on the P and D buses. 

In a combined mode, the arbiter 134 is operable to arbitrate between the in-sync processing sets 14 and 
16 and any initiators on the device bus 22 for use of the bridge 12. Possible scenarios are: 

- processing set access to the device bus 22; 

- processing set access to internal registers in the bridge 12; 

- Device access to the processing set memory 56. 

In split mode, both processing sets 14 and 16 must arbitrate the use of the bridge and thus access to the 
device bus 22 and internal bridge registers (e.g. CSR registers 114). The bridge 12 must also contend with 
initiators on the device bus 22 for use of that device bus 22. 

Each slot on the device bus has an arbitration enable bit associated with it. These arbitration enable 
bits are cleared after reset and must be set to allow a slot to request a bus. When a device on the device bus 22 
is suspected of providing an I/O error, the arbitration enable bit for that device is automatically reset by the 
bridge. 

A PCI bus interface in the processing set bus controller(s) 50 expects to be the master bus controller 
for the P bus concerned, that is it contains the PCI bus arbiter for the PA or PB bus to which it is connected. 
The bridge 12 cannot directly control access to the PA and PB buses 24 and 26. The bridge 12 competes for 
access to the PA or PB bus with the processing set on the bus concerned under the control of the bus controller 
50 on the bus concerned. 

Also shown in Figure 8 is a comparator 130 and a bridge controller 132. The comparator 130 is 
operable to compare I/O cycles from the processing sets 14 and 16 to determine any out-of-sync events. On 
determining an out-of-sync event, the comparator 130 is operable to cause the bridge controller 132 to activate 
an EState for analysis of the out-of-sync event and possible recovery therefrom. 

Figure 9 is a schematic functional overview of the routing matrix 80. 

The routing matrix 80 comprises a multiplexer 143 which is responsive to initiator control signals 98 
from the initiator controller 138 of Figure 8 to select one of the PA bus path 94, PB bus path 96, D bus path 92 
or internal bus path 100 as the current input to the routing matrix. Separate output buffers 144, 145, 146 and 
147 are provided for output to each of the paths 94, 96, 92 and 100, with those buffers being selectively enabled 
by signals 99 from the target controller 140 of Figure 8. Between the multiplexer and the buffers 144-147 
signals are held in a buffer 149. In the present embodiment three cycles of data for an I/O cycle will be held in 
the pipeline represented by the multiplexer 143, the buffer 149 and the buffers 144. 

In Figures 6 to 9 a functional description of elements of the bridge has been given. Figure 10 is a 
schematic representation of a physical configuration of the bridge in which the bridge control logic 88, the 
storage control logic 90 and the bridge registers 110 are implemented in a first field programmable gate array 
(FPGA) 89, the routing matrix 80 is implemented in further FPGAs 80.1 and 80.2 and the SRAM 126 is 
implemented as one or more separate SRAMs addressed by an address control line 127. The bus interfaces 82, 
84 and 86 shown in Figure 6 are not separate elements, but are integrated in the FPGAs 80.1, 80.2 and 89. Two 
FPGAs 80.1 and 80.2 are used for the upper 32 bits 32-63 of a 64 bit PCI bus and the lower 32 bits 0-31 of the 
64 bit PCI bus. It will be appreciated that a single FPGA could be employed for the routing matrix 80 where 
the necessary logic can be accommodated within the device. Indeed, where an FPGA of sufficient capacity is 
available, the bridge control logic, storage control logic and the bridge registers could be incorporated in the 
same FPGA as the routing matrix. Indeed many other configurations may be envisaged, and indeed technology 
other than FPGAs, for example one or more Application Specific Integrated Circuits (ASICs), may be 
employed. As shown in Figure 10, the FPGAs 89, 80.1 and 80.2 and the SRAM 126 are connected via internal 
bus paths 85 and path control lines 87. 

Figure 11 is a transition diagram illustrating in more detail the various operating modes of the bridge. 
The bridge operation can be divided into three basic modes, namely an error state (EState) mode 150, a split 
state mode 156 and a combined state mode 158. The EState mode 150 can be further divided into 2 states. 

After initial resetting on powering up the bridge, or following an out-of-sync event, the bridge is in this 
initial EState 152. In this state, all writes are stored in the posted write buffer 120 and reads from the internal 
bridge registers (e.g., the CSR registers 116) are allowed, and all other reads are treated as errors (i.e. they are 
aborted). In this state, the individual processing sets 14 and 16 perform evaluations for determining a restart 
time. Each processing set 14 and 16 will determine its own restart timer timing. The timer setting depends on a 
"blame" factor for the transition to the EState. A processing set which determines that it is likely to have caused 
the error sets a long time for the timer. A processing set which thinks it unlikely to have caused the error sets a 
short time for the timer. The first of the processing sets 14 and 16 to time out becomes the primary processing set. 
Accordingly, when this is determined, the bridge moves (153) to the primary EState 154. 
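
The timer-based selection of the primary processing set can be sketched as follows. The linear mapping from blame factor to timeout, the numeric range of the blame factor and the function names are all assumptions made for illustration; the text states only that a higher likelihood of blame gives a longer timer and that the first set to time out becomes primary.

```python
def restart_timeout(blame, short=1.0, long=10.0):
    """Map a blame factor in [0, 1] to a restart timer setting (arbitrary units)."""
    blame = min(max(blame, 0.0), 1.0)
    return short + blame * (long - short)


def elect_primary(blame_by_set):
    """Return the processing set whose timer expires first."""
    return min(blame_by_set, key=lambda name: restart_timeout(blame_by_set[name]))
```

For example, with blame factors of 0.9 for one set and 0.1 for the other, the set with the lower blame factor times out first and is selected.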

When either processing set 14/16 has become the primary processing set, the bridge is then operating 
in the primary EState 154. This state allows the primary processing set to write to bridge registers (specifically 
the SRRs 118). Other writes are no longer stored in the posted write buffer, but are simply lost. Device bus 
reads are still aborted in the primary EState 154. 

Once the EState condition is removed, the bridge then moves (155) to the split state 156. In the split 
state 156, access to the device bus 22 is controlled by the SRR registers 118 while access to the bridge storage is 
simply arbitrated. The primary status of the processing sets 14 and 16 is ignored. Transition to a combined 
operation is achieved by means of a sync_reset (157). After issue of the sync_reset operation, the bridge is then 
operable in the combined state 158, whereby all read and write accesses on the D bus 22 and the PA and PB 
buses 24 and 26 are allowed. All such accesses on the PA and PB buses 24 and 26 are compared in the 
comparator 130. Detection of a mismatch between any read and write cycles (with an exception of specific 
dissimilar data I/O cycles) causes a transition 151 to the EState 150. The various states described are controlled 
by the bridge controller 132. 
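
The mode transitions described above can be summarised as a small state machine. This is a sketch only: the event names are hypothetical labels for the conditions in the text, and the enum values simply reuse the reference numerals of the states.

```python
from enum import Enum


class Mode(Enum):
    INITIAL_ESTATE = 152
    PRIMARY_ESTATE = 154
    SPLIT = 156
    COMBINED = 158


# Event names are illustrative; the trailing comments give the transition numerals.
TRANSITIONS = {
    (Mode.INITIAL_ESTATE, "primary_elected"): Mode.PRIMARY_ESTATE,  # 153
    (Mode.PRIMARY_ESTATE, "estate_cleared"): Mode.SPLIT,            # 155
    (Mode.SPLIT, "sync_reset"): Mode.COMBINED,                      # 157
    (Mode.COMBINED, "mismatch"): Mode.INITIAL_ESTATE,               # 151
}


def next_mode(mode, event):
    """Return the next mode, or stay in the current one if the event does not apply."""
    return TRANSITIONS.get((mode, event), mode)
```

Starting from the initial EState and applying the four events in order returns the sketch to the initial EState, which corresponds to one full cycle of error, recovery and resynchronisation.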

The role of the comparator 130 is to monitor and compare I/O operations on the PA and PB buses in 
the combined state 151 and, in response to a mismatched signal, to notify the bridge controller 132, whereby the 
bridge controller 132 causes the transition 152 to the error state 150. The I/O operations can include all I/O 
operations initiated by the processing sets, as well as DMA transfers in respect of DMA initiated by a device on 
the device bus. 

Table 1 below summarizes the various access operations which are allowed in each of the operational 
states. 



TABLE 1 

                 EState                 Primary EState   Split                     Combined 
D Bus - Read     Master Abort           Master Abort     Controlled by SRR bits    Allowed and compared 
                                                         and arbitrated 
D Bus - Write    Stored in Post Write   Lost             Controlled by SRR bits    Allowed and compared 
                 Buffer                                  and arbitrated 



As described above, after an initial reset, the system is in the initial EState 152. In this state, neither 
of the processing sets 14 and 16 can access the D bus 22 or the P bus 26 or 24 of the other processing set 16 or 14. The 
internal bridge registers 116 of the bridge are accessible, but are read only. 

A system running in the combined mode 158 transitions to the EState 150 when a comparison 
failure is detected in this bridge, or alternatively a comparison failure is detected in another bridge in a 
multi-bridge system as shown, for example, in Figure 2. Also, transitions to an EState 150 can occur in other 
situations, for example in the case of a software controlled event forming part of a self test operation. 

On moving to the EState 150, an interrupt is signaled to all or a subset of the processors of the 
processing sets via an interrupt line 95. Following this, all I/O cycles generated on a P bus 24 or 26 result in 
reads being returned with an exception and writes being recorded in the posted write buffer. 

The operation of the comparator 130 will now be described in more detail. The comparator is 
connected to paths 94, 95, 96 and 97 for comparing address, data and selected control signals from the PA and 
PB bus interfaces 84 and 86. A failed comparison of in-sync accesses to device I/O bus 22 devices causes a 
move from the combined state 158 to the EState 150. 

For processing set I/O read cycles, the address, command, address parity, byte enables and parity error 
parameters are compared. 

If the comparison fails during the address phase, the bridge asserts a retry to the processing set bus 
controllers 50, which prevents data leaving the I/O bus controllers 50. No activity occurs in this case on the 
device I/O bus 22. On the processor(s) retrying, no error is returned. 

If the comparison fails during a data phase (only control signals and byte enables are checked), the 
bridge signals a target-abort to the processing set bus controllers 50. An error is returned to the processors. 

In the case of processing set I/O bus write cycles, the address, command, parity, byte enables and data 
parameters are compared. 

If the comparison fails during the address phase, the bridge asserts a retry to the processing set bus 
controllers 50, which results in the processing set bus controllers 50 retrying the cycle again. The posted write 
buffer 122 is then active. No activity occurs on the device I/O bus 22. 

If the comparison fails during the data phase of a write operation, no data is passed to the D bus 22. 
The failing data and any other transfer attributes from both processing sets 14 and 16 are stored in the 
disconnect registers 122, and any subsequent posted write cycles are recorded in the posted write buffer 118. 
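
The four failure cases described above for processing set read and write cycles can be collected into one decision function. This is an illustrative summary only; the function name and the returned strings are hypothetical labels for the responses described in the text.

```python
def on_compare_failure(cycle, phase):
    """Return the bridge's response to a failed comparison in the combined mode.

    cycle is "read" or "write"; phase is "address" or "data".
    """
    if phase == "address":
        # Both read and write cycles: retry, no activity on the device I/O bus.
        return "retry"
    if cycle == "read":
        # Data-phase failure on a read: target-abort, error returned to the processors.
        return "target-abort"
    # Data-phase failure on a write: nothing reaches the D bus; failing data is captured.
    return "store in disconnect registers"
```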

In the case of direct virtual memory access (DVMA) reads, the data control and parity are checked for 
each datum. If the data does not match, the bridge 12 terminates the transfer on the P bus. In the case of 
DVMA writes, control and parity error signals are checked for correctness. 

Other signals in addition to those specifically mentioned above can be compared to give an indication 
of divergence of the processing sets. Examples of these are bus grants and various specific signals during 
processing set transfers and during DMA transfers. 

Errors fall roughly into two types, those which are made visible to the software by the processing set 
bus controller 50 and those which are not made visible by the processing set bus controller 50 and hence need to 
be made visible by an interrupt from the bridge 12. Accordingly, the bridge is operable to capture errors 
reported in connection with processing set read and write cycles, and DMA reads and writes. 

Clock control for the bridge is performed by the bridge controller 132 in response to the clock signals 
from the clock line 21. Individual control lines from the controller 132 to the various elements of the bridge are 
not shown in Figures 6 to 10. 

Figure 12 is a flow diagram illustrating a possible sequence of operating stages where lockstep errors 
are detected during a combined mode of operation. 

Stage S1 represents the combined mode of operation where lockstep error checking is performed by 
the comparator 130 shown in Figure 8. 

In Stage S2, a lockstep error is assumed to have been detected by the comparator 130. 

In Stage S3, the current state is saved in the CSR registers 114 and posted writes are saved in the 
posted write buffer 122 and/or in the disconnect registers 120. 

Figure 13 illustrates Stage S3 in more detail. Accordingly, in Stage S31, the bridge controller 132 
detects whether the lockstep error notified by the comparator 130 has occurred during a data phase in which it is 
possible to pass data to the device bus 22. In this case, in Stage S32, the bus cycle is terminated. Then, in Stage 
S33 the data phases are stored in the disconnect registers 120 and control then passes to Stage S35 where an 
evaluation is made as to whether a further I/O cycle needs to be stored. Alternatively, if at Stage S31, it is 
determined that the lockstep error did not occur during a data phase, the address and data phases for any posted 
write I/O cycles are stored in the posted write buffer 122. At Stage S34, if there are any further posted write I/O 
operations pending, these are also stored in the posted write buffer 122. 

Stage S3 is performed at the initiation of the initial error state 152 shown in Figure 11. In this state, the 
first and second processing sets arbitrate for access to the bridge. Accordingly, in Stage S31-S35, the posted 
write address and data phases for each of the processing sets 14 and 16 are stored in separate portions of the 
posted write buffer 122, and/or in the single set of disconnect registers as described above. 

Figure 14 illustrates the source of the posted write I/O cycles which need to be stored in the posted 
write buffer 122. During normal operation of the processing sets 14 and 16, output buffers 162 in the individual 
processors contain I/O cycles which have been posted for transfer via the processing set bus controllers 50 to 
the bridge 12 and eventually to the device bus 22. Similarly, buffers 160 in the processing set controllers 50 
also contain posted I/O cycles for transfer over the buses 24 and 26 to the bridge 12 and eventually to the device 
bus 22. 

Accordingly, it can be seen that when an error state occurs, I/O write cycles may already have been 
posted by the processors 52, either in their own buffers 162, or already transferred to the buffers 160 of the 
processing set bus controllers 50. It is the I/O write cycles in the buffers 162 and 160 which gradually 
propagate through and need to be stored in the posted write buffer 122. 

As shown in Figure 15, a write cycle 164 posted to the posted write buffer 122 can comprise an 
address field 165 including an address and an address type, and between one and 16 data fields 166 including a 
byte enable field and the data itself. 

The data is written into the posted write buffer 122 in the EState unless the initiating processing set has 
been designated as a primary CPU set. At that time, non-primary writes in an EState still go to the posted write 
buffer even after one of the CPU sets has become a primary processing set. An address pointer in the CSR 
registers 114 points to the next available posted write buffer address, and also provides an overflow bit which is 
set when the bridge attempts to write past the top of the posted write buffer for any one of the processing sets 
14 and 16. Indeed, in the present implementation, only the first 16 K of data is recorded in each buffer. 
Attempts to write beyond the top of the posted write buffer are ignored. The value of the posted write buffer 
pointer can be cleared at reset, or by software using a write under the control of a primary processing set. 
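
The posted write format and the pointer and overflow behaviour can be modelled roughly as below. Several details are assumptions made for the sketch and are not stated in the text: that "16 K" is measured in bytes, that each address or data entry occupies four bytes, and that clearing the pointer also clears the overflow bit. The class names are hypothetical.

```python
from dataclasses import dataclass, field

BUFFER_LIMIT = 16 * 1024  # "the first 16 K of data"; the unit (bytes) is assumed here


@dataclass
class PostedWrite:
    address: int
    address_type: int
    data_fields: list  # between 1 and 16 (byte_enable, data) pairs

    def __post_init__(self):
        if not 1 <= len(self.data_fields) <= 16:
            raise ValueError("a posted write carries between 1 and 16 data fields")


@dataclass
class PostedWriteBuffer:
    word_size: int = 4  # assumed size in bytes of each address or data entry
    pointer: int = 0    # next available buffer address
    overflow: bool = False
    entries: list = field(default_factory=list)

    def post(self, write):
        size = self.word_size * (1 + len(write.data_fields))
        if self.pointer + size > BUFFER_LIMIT:
            self.overflow = True  # writes past the top are ignored
            return False
        self.entries.append(write)
        self.pointer += size
        return True

    def clear(self):
        # Models clearing the pointer at reset or by a primary processing set.
        self.pointer = 0
        self.overflow = False
        self.entries.clear()
```

One such buffer would be kept per processing set, since the text describes separate portions of the posted write buffer for the processing sets 14 and 16.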

Returning to Figure 12, after saving the status and posted writes, at Stage S4 the individual processing 
sets independently seek to evaluate the error state and to determine whether one of the processing sets is faulty. 
This determination is made by the individual processors in an error state in which they individually read status 
from the control state and EState registers 114. During this error mode, the arbiter 134 arbitrates for access to 
the bridge 12. 

In Stage S5, one of the processing sets 14 and 16 establishes itself as the primary processing set. This 
is determined by each of the processing sets identifying a time factor based on the estimated degree of 
responsibility for the error, whereby the first processing set to time out becomes the primary processing set. In 
Stage S5, the status is recovered for that processing set and is copied to the other processing set. The primary 
processing set is able to access the posted write buffer 122 and the disconnect registers 120. 

In Stage S6, the bridge is operable in a split mode. If it is possible to re-establish an equivalent status 
for the first and second processing sets, then a reset is issued at Stage S7 to put the processing sets in the 
combined mode at Stage S1. However, it may not be possible to re-establish an equivalent state until a faulty 
processing set is replaced. Accordingly the system will stay in the split mode of Stage S6 in order to continue 
operation based on a single processing set. After replacing the faulty processing set the system could then 
establish an equivalent state and move via Stage S7 to Stage S1. 

As described above, the comparator 130 is operable in the combined mode to compare the I/O 
operations output by the first and second processing sets 14 and 16. This is fine as long as all of the I/O 
operations of the first and second processing sets 14 and 16 are fully synchronized and deterministic. Any 
deviation from this will be interpreted by the comparator 130 as a loss of lockstep. This is in principle correct 
as even a minor deviation from identical outputs, if not trapped by the comparator 130, could lead to the 
processing sets diverging further from each other as the individual processing sets act on the deviating outputs. 
However, a strict application of this puts significant constraints on the design of the individual processing sets. 

An example of this is that it would not be possible to have independent time of day clocks in the individual 
processing sets operating under their own clocks. This is because it is impossible to obtain two crystals which 
are 100% identical in operation. Even small differences in the phase of the clocks could be critical as to 
whether the same sample is taken at any one time, for example either side of a clock transition for the respective 
processing sets. 

Accordingly, a solution to this problem employs the dissimilar data registers (DDR) 116 mentioned 
earlier. The solution is to write data from the processing sets into respective DDRs in the bridge while disabling 
the comparison of the data phases of the write operations and then to read a selected one of the DDRs back to 
each processing set, whereby each of the processing sets is able to act on the same data. 

Figure 17 is a schematic representation of details of the bridge of Figures 6 to 10. It will be noted that 
details of the bridge not shown in Figures 6 to 8 are shown in Figure 17, whereas other details of the bridge 
shown in Figures 6 to 8 are not shown in Figure 17, for reasons of clarity. 

The DDRs 116 are provided in the bridge registers 110 of Figure 7, but could be provided elsewhere in 
the bridge in other embodiments. One DDR 116 is provided for each processing set. In the example of the 
multi-processor system of Figure 1 where two processing sets 14 and 16 are provided, two DDRs 116A and 
116B are provided, one for each of the first and second processing sets 14 and 16, respectively. 

Figure 17 represents a dissimilar data write stage. The addressing logic 136 is shown schematically to 
comprise two decoder sections, one decoder section 136A for the first processing set 14 and one decoder section 
136B for the second processing set 16. During an address phase of a dissimilar data I/O write operation each of 
the processing sets 14 and 16 outputs the same predetermined address DDR-W which is separately interpreted 
by the respective first and second decoding sections 136A and 136B as addressing the respective first and 
second DDRs 116A and 116B. As the same address is output by the first and second processing sets 
14 and 16, this is not interpreted by the comparator 130 as a lockstep error. 

The decoding section 136A, or the decoding section 136B, or both are arranged to further output a 
disable signal 137 in response to the predetermined write address supplied by the first and second processing 
sets 14 and 16. This disable signal is supplied to the comparator 130 and is operative during the data phase of 
the write operation to disable the comparator. As a result, the data output by the first processing set can be 
stored in the first DDR 116A and the data output by the second processing set can be stored in the second DDR 
116B without the comparator being operative to detect a difference, even if the data from the first and second 
processing sets is different. The first decoding section is operable to cause the routing matrix to store the data 
from the first processing set 14 in the first DDR 116A and the second decoding section is operable to cause the 
routing matrix to store the data from the second processing set 16 in the second DDR 116B. At the end of the 
data phase the comparator 130 is once again enabled to detect any differences between I/O address and/or data 
phases as indicative of a lockstep error. 

Following the writing of the dissimilar data to the first and second DDRs 116A and 116B, the 
processing sets are then operable to read the data from a selected one of the DDRs 116A/116B. 

Figure 13 illustrates an alternative arrangement where the disable signal 137 is negated and is used to 
control a gate 131 at the output of the comparator 130. When the disable signal is active the output of the 
comparator is disabled, whereas when the disable signal is inactive the output of the comparator is enabled. 

Figure 19 illustrates the reading of the first DDR 116A in a subsequent dissimilar data read stage. As 
illustrated in Figure 19, each of the processing sets 14 and 16 outputs the same predetermined address DDR-RA 
which is separately interpreted by the respective first and second decoding sections 136A and 136B as 
addressing the same DDR, namely the first DDR 116A. As a result, the content of the first DDR 116A is read 
by both of the processing sets 14 and 16, thereby enabling those processing sets to receive the same data. This 
enables the two processing sets 14 and 16 to achieve deterministic behavior, even if the source of the data 
written into the DDRs 116 by the processing sets 14 and 16 was not deterministic. 

As an alternative, the processing sets could each read the data from the second DDR 116B. Figure 20 
illustrates the reading of the second DDR 116B in a dissimilar data read stage following the dissimilar data 
write stage of Figure 15. As illustrated in Figure 20, each of the processing sets 14 and 16 outputs the same 
predetermined address DDR-RB which is separately interpreted by the respective first and second decoding 
sections 136A and 136B as addressing the same DDR, namely the second DDR 116B. As a result, the content 
of the second DDR 116B is read by both of the processing sets 14 and 16, thereby enabling those processing 
sets to receive the same data. As with the dissimilar data read stage of Figure 16, this enables the two 
processing sets 14 and 16 to achieve deterministic behavior, even if the source of the data written into the DDRs 
116 by the processing sets 14 and 16 was not deterministic. 

The selection of which of the first and second DDRs 116A and 116B to be read can be determined in 
any appropriate manner by the software operating on the processing modules. This could be done on the basis 
of a simple selection of one or the other DDRs, or on a statistical basis or randomly or in any other manner as 
long as the same choice of DDR is made by both or all of the processing sets. 

Figure 21 is a flow diagram summarizing the various stages of operation of the DDR mechanism 
described above. 

In stage S10, a DDR write address DDR-W is received and decoded by the address decoder sections 
136A and 136B during the address phase of the DDR write operation. 

In stage S11, the comparator 130 is disabled. 

In stage S12, the data received from the processing sets 14 and 16 during the data phase of the DDR 
write operation is stored in the first and second DDRs 116A and 116B, respectively, as selected by the first and 
second decode sections 136A and 136B, respectively. 

In stage S13, a DDR read address is received from the first and second processing sets and is decoded 
by the decode sections 136A and 136B, respectively. 

If the received address DDR-RA is for the first DDR 116A, then in stage S14 the content of that DDR 
116A is read by both of the processing sets 14 and 16. 

Alternatively, if the received address DDR-RB is for the second DDR 116B, then in stage S15 
the content of that DDR 116B is read by both of the processing sets 14 and 16. 
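
Stages S10 to S15 can be modelled with a short sketch. It is a behavioural toy only: the class name is hypothetical, the addresses DDR-W, DDR-RA and DDR-RB are treated as symbolic labels, and the comparator is reduced to a single flag.

```python
class DissimilarDataRegisters:
    """Toy model of the DDR write and read stages (addresses are symbolic)."""

    def __init__(self):
        self.ddr = {"A": None, "B": None}
        self.comparator_enabled = True

    def write(self, data_a, data_b):
        # Stages S10-S12: both sets output DDR-W; the data phase is not compared.
        self.comparator_enabled = False
        self.ddr["A"], self.ddr["B"] = data_a, data_b
        self.comparator_enabled = True  # re-enabled at the end of the data phase

    def read(self, address):
        # Stages S13-S15: DDR-RA or DDR-RB selects one register for both sets.
        selected = {"DDR-RA": "A", "DDR-RB": "B"}[address]
        value = self.ddr[selected]
        return value, value  # (value seen by set 14, value seen by set 16)
```

If, say, the two processing sets write slightly different time of day samples, a subsequent read of either register returns the same sample to both sets, so that both continue from identical data.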

Figure 22 is a schematic representation of the arbitration performed on the respective buses 22, 24 and 
26 and the arbitration for the bridge itself. 

Each of the processing set bus controllers 50 in the respective processing sets 14 and 16 includes a 
conventional PCI master bus arbiter 180 for providing arbitration to the respective buses 24 and 26. Each of the 
master arbiters 180 is responsive to request signals from the associated processing set bus controller 50 and the 
bridge 12 on respective request (REQ) lines 181 and 182. The master arbiters 180 allocate access to the bus on 
a first-come-first-served basis, issuing a grant (GNT) signal to the winning party on an appropriate grant line 
183 or 184. 

A conventional PCI bus arbiter 185 provides arbitration on the D bus 22. The D bus arbiter 185 can be 
configured as part of the D bus interface 82 of Figure 6 or could be separate therefrom. As with the P bus 
master arbiters 180, the D bus arbiter is responsive to request signals from the contending devices, including the 
bridge and the devices 30, 31, etc. connected to the device bus 22. Respective request lines 186, 187, 188, etc. 
for each of the entities competing for access to the D bus 22 are provided for the request signals (REQ). The D 
bus arbiter 185 allocates access to the D bus on a first-come-first-served basis, issuing a grant (GNT) signal to 
the winning entity via respective grant lines 189, 190, 192, etc. 

Figure 23 is a state diagram summarising the operation of the D bus arbiter 185. In a particular 
embodiment up to six request signals may be produced by respective D bus devices and one by the bridge itself. 
On a transition into the GRANT state, these are sorted by a priority encoder and a request signal (REQ#) with 
the highest priority is registered as the winner and gets a grant (GNT#) signal. Each winner which is selected 
modifies the priorities in a priority encoder so that, given the same REQ# signals on the next move to GRANT, a 
different device has the highest priority; hence each device has a "fair" chance of accessing DEV. The bridge 
REQ# has a higher weighting than D bus devices and will, under very busy conditions, get the bus for every 
second device. 
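
One way to picture this rotating priority scheme is the sketch below. It is an assumption-laden model, not the arbiter's actual logic: the class name, the use of a simple list rotation for the priority encoder, and the rule that the bridge wins whenever it requests and did not win the previous grant are all illustrative choices that reproduce the "every second" behaviour under heavy load.

```python
class DBusArbiter:
    """Rotating-priority arbiter sketch; 'bridge' is weighted above the devices."""

    def __init__(self, devices):
        self.order = list(devices)   # current device priority, highest first
        self.last_winner = None

    def grant(self, requests):
        """Pick a winner from the set of requesters asserting REQ#, or None."""
        if "bridge" in requests and self.last_winner != "bridge":
            winner = "bridge"
        else:
            winner = next((d for d in self.order if d in requests), None)
            if winner is None and "bridge" in requests:
                winner = "bridge"
            elif winner is not None:
                # The winner drops to the lowest priority for the next GRANT.
                self.order.remove(winner)
                self.order.append(winner)
        self.last_winner = winner
        return winner
```

With the bridge and three devices all requesting continuously, successive grants in this sketch alternate between the bridge and the devices in turn, so that each device still gets access while the bridge receives every second grant.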

If a device requesting the bus fails to perform a transaction within 16 cycles it may lose GNT# via the 
BACKOFF state. BACKOFF is required as, under PCI rules, a device may access the bus one cycle after GNT# 
is removed. Devices may only be granted access to the D bus if the bridge is not in the EState. A new 
GNT# is produced at the times when the bus is idle. 

In the GRANT and BUSY states, the FETs are enabled and an accessing device is known and 
forwarded to the D bus address decode logic for checking against a DMA address provided by the device. 

Turning now to the bridge arbiter 134, this allows access to the bridge for the first device which asserts 
the PCI FRAME# signal indicating an address phase. Figure 24 is a state diagram summarising the operation of 
the bridge arbiter 134. 

As with the D bus arbiter, a priority encoder can be provided to resolve access attempts which collide. 
In the case of a collision the loser/losers are retried, which forces them to give up the bus. Under PCI rules 
retried devices must try repeatedly to access the bridge and this can be expected to happen. 

To prevent devices which are very quick with their retry attempt from hogging the bridge, retried 
interfaces are remembered and assigned a higher priority. These remembered retries are prioritised in the same 
way as address phases. However, as a precaution this mechanism is timed out so as not to get stuck waiting for a 
faulty or dead device. The algorithm employed prevents a device which hasn't yet been retried, but which 
would be a higher priority retry than a device currently waiting for, from being retried at the first attempt. 

In combined operations a PA or PB bus input selects which P bus interface will win a bridge access. 
Both are informed they won. Allowed selection enables latent fault checking during normal operation. EState 
prevents the D bus from winning. 

The bridge arbiter 134 is responsive to standard PCI signals provided on standard PCI control lines 22, 
24 and 25 to control access to the bridge 12. 

Figure 25 illustrates signals associated with an I/O operation cycle on the PCI bus. A PCI frame signal 
(FRAME#) is initially asserted. At the same time, address (A) signals will be available on the DATA BUS and 
the appropriate command (write/read) signals (C) will be available on the command bus (CMD BUS). Shortly 
after the frame signal being asserted low, the initiator ready signal (IRDY#) will also be asserted low. When the 
device responds, a device selected signal (DEVSEL#) will be asserted low. When a target ready signal is 
asserted low (TRDY#), data transfer (D) can occur on the data bus. 

The bridge is operable to allocate access to the bridge resources and thereby to negotiate allocation of a
target bus in response to FRAME# being asserted low for the initiator bus concerned. Accordingly, the
bridge arbiter 134 is operable to allocate access to the bridge resources and/or to a target bus on a
first-come-first-served basis in response to FRAME# being asserted low. As well as the simple first-come-first-served
basis, the arbiters may additionally be provided with a mechanism for logging the arbitration requests, and can
imply a conflict resolution based on the request and allocation history where two requests are received at an
identical time. Alternatively, a simple priority can be allocated to the various requesters, whereby, in the case
of identically timed requests, a particular requester always wins the allocation process.
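The first-come-first-served allocation with a simple priority tie-break can be sketched as follows. This is a minimal illustration only, not the arbiter logic of the bridge; the requester names, the priority order in `PRIORITY` and the function `arbitrate` are assumptions introduced for the example.

```python
# Hypothetical fixed priority order, used only to break ties between
# identically timed requests (the order itself is an assumption).
PRIORITY = ["PA", "PB", "D"]


def arbitrate(requests):
    """Pick a winner from (requester, frame_assert_time) pairs.

    The earliest FRAME# assertion wins (first come, first served);
    identically timed requests are resolved by the fixed priority order.
    Returns None when there are no requests.
    """
    if not requests:
        return None
    winner = min(requests, key=lambda r: (r[1], PRIORITY.index(r[0])))
    return winner[0]
```

With this sketch, a request asserted at an earlier time always wins, and the priority list matters only when two assertion times are equal.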

Each of the slots on the device bus 22 has a slot response register (SRR) 118, as do other devices
connected to the bus, such as a SCSI interface. Each of the SRRs 118 contains bits defining the ownership of
the slots or the devices connected to the slots on the direct memory access bus. In this embodiment, and for
reasons to be elaborated below, each SRR 118 comprises a four bit register. However, it will be appreciated
that a larger register will be required to determine ownership between more than two processing sets. For
example, if three processing sets are provided, then a five bit register will be required for each slot.

Figure 16 illustrates schematically one such four bit register 600. As shown in Figure 16, a first bit
602 is identified as SRR[0], a second bit 604 is identified as SRR[1], a third bit 606 is identified as SRR[2] and
a fourth bit 608 is identified as SRR[3].

Bit SRR[0] is a bit which is set when writes for valid transactions are to be suppressed. 
Bit SRR[1] is set when the device slot is owned by the first processing set 14. This defines the access 
route between the first processing set 14 and the device slot. When this bit is set, the first processing set 14 can 
always be master of a device slot 22, while the ability for the device slot to be master depends on whether bit 
SRR[3] is set. 

Bit SRR[2] is set when the device slot is owned by the second processing set 16. This defines the
access route between the second processing set 16 and the device slot. When this bit is set, the second
processing set 16 can always be master of the device slot or bus 22, while the ability for the device slot to be
master depends on whether bit SRR[3] is set.

Bit SRR[3] is an arbitration bit which gives the device slot the ability to become master of the device
bus 22, but only if it is owned by one of the processing sets 14 and 16, that is if one of the SRR[1] and SRR[2]
bits is set.

When the fake bit (SRR[0]) of an SRR 118 is set, writes to the device for that slot are ignored and do
not appear on the device bus 22. Reads return indeterminate data without causing a transaction on the device
bus 22. In the event of an I/O error, the fake bit SRR[0] of the SRR 118 corresponding to the device which
caused the error is set by the hardware configuration of the bridge to disable further access to the device slot
concerned. An interrupt may also be generated by the bridge to inform the software which originated the access
leading to the I/O error that the error has occurred. The fake bit has an effect whether the system is in the split
or the combined mode of operation.

The ownership bits only have effect, however, in the split system mode of operation. In this mode,
each slot can be in one of three states:

Not-owned;
Owned by processing set 14; and
Owned by processing set 16.

This is determined by the two SRR bits SRR[1] and SRR[2], with SRR[1] being set when the slot is
owned by processing set 14 and SRR[2] being set when the slot is owned by processing set 16. If the slot is
un-owned, then neither bit is set (both bits set is an illegal condition and is prevented by the hardware).

A slot which is not owned by the processing set making the access (this includes un-owned slots)
cannot be accessed and results in an abort. A processing set can only claim an un-owned slot; it cannot wrest
ownership away from another processing set. This can only be done by powering off the other processing set.
When a processing set is powered off, all slots owned by it move to the un-owned state. Whilst it is not
possible for a processing set to wrest ownership from another processing set, it is possible for a processing set to
give ownership to another processing set.
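The ownership rules described above can be modelled in a few lines. The following is a sketch under stated assumptions: the `Owner` enumeration, the `Slot` class and its method names are hypothetical and introduced only to illustrate the rules (claim only when un-owned, give but never wrest, release on power-off); they are not part of the bridge hardware.

```python
from enum import Enum


class Owner(Enum):
    NONE = 0      # slot is un-owned
    SET_14 = 1    # owned by processing set 14
    SET_16 = 2    # owned by processing set 16


class Slot:
    """Hypothetical model of the split-mode ownership state of one slot."""

    def __init__(self):
        self.owner = Owner.NONE

    def claim(self, requester):
        """A processing set may claim only an un-owned slot."""
        if self.owner is Owner.NONE:
            self.owner = requester
            return True
        return False  # ownership cannot be wrested from another set

    def give(self, current, recipient):
        """Only the current owner may hand the slot to another set."""
        if current is not Owner.NONE and self.owner is current:
            self.owner = recipient
            return True
        return False

    def power_off(self, processing_set):
        """Slots owned by a powered-off set move to the un-owned state."""
        if self.owner is processing_set:
            self.owner = Owner.NONE

    def can_access(self, requester):
        """Access by a non-owner, including to an un-owned slot, aborts."""
        return self.owner is not Owner.NONE and self.owner is requester
```

In this model the only route from "owned by 14" to "owned by 16" is a voluntary `give` or a `power_off` followed by a fresh `claim`.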

The ownership bits can be altered when in the combined mode of operation, but they have no effect
until the split mode is entered.

Table 2 below summarizes the access rights as determined by an SRR 118.

From Table 2, it can be seen that when the 4-bit SRR for a given device is set to 1100, for example,
then the slot is owned by processing set B (i.e. SRR[2] is logic high) and processing set A may not read from or
write to the device (i.e. SRR[1] is logic low), although it may read from or write to the bridge. "FAKE_AT" is
set logic low (i.e. SRR[0] is logic low), indicating that access to the device bus is allowed as there are no faults
on the bus. As "ARB_EN" is set logic high (i.e. SRR[3] is logic high), the device with which the register is
associated can become master of the D bus. This example demonstrates the operation of the register when the
bus and associated devices are operating correctly.

TABLE 2

SRR[3][2][1][0] | PA BUS                                    | PB BUS                                    | Device Interface
0000 / x00x     | Read/Write bridge SRR                     | Read/Write bridge SRR                     | Access denied
0010            | Read/Write bridge; owned D slot           | Read/Write bridge; no access to D slot    | Access denied because arbitration bit is off
0100            | Read/Write bridge; no access to D slot    | Read/Write bridge; access to D slot       | Access denied because arbitration bit is off
1010            | Read/Write bridge; owned D slot           | Read/Write bridge; no access to D slot    | Access to CPU B denied; access to CPU A OK
1100            | Read/Write bridge; no access to D slot    | Read/Write bridge; access to D slot       | Access to CPU A denied; access to CPU B OK
0011            | Read/Write bridge; bridge discards writes | Read/Write bridge; no access to D slot    | Access denied because arbitration bit is off
0101            | Read/Write bridge; no access to D slot    | Read/Write bridge; bridge discards writes | Access denied because arbitration bit is off
1011            | Read/Write bridge; bridge discards writes | Read/Write bridge; no access to D slot    | Access to CPU B denied; access to CPU A OK
1101            | Read/Write bridge; no access to D slot    | Read/Write bridge; bridge discards writes | Access to CPU B denied; access to CPU A OK

In an alternative example, where the SRR for the device is set to 0101, the setting of SRR[2] logic high
indicates that the device is owned by processing set B. However, as the device is malfunctioning, SRR[3] is set
logic low and the device is not allowed access to the processing set. SRR[0] is set high so that any writes to the
device are ignored and reads therefrom return indeterminate data. In this way, the malfunctioning device is
effectively isolated from the processing set, and provides indeterminate data to satisfy any device drivers, for
example, that might be looking for a response from the device.

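The two register settings discussed in the text, 1100 and 0101, can be decoded mechanically from the bit definitions of the SRR. The sketch below is illustrative only; the constant names and the `decode_srr` function are assumptions introduced for the example, while the bit positions follow the description of SRR[0] to SRR[3].

```python
# Bit masks for the four-bit slot response register (names are assumptions).
FAKE = 1 << 0     # SRR[0]: writes suppressed, reads return indeterminate data
OWNED_A = 1 << 1  # SRR[1]: slot owned by the first processing set (A)
OWNED_B = 1 << 2  # SRR[2]: slot owned by the second processing set (B)
ARB_EN = 1 << 3   # SRR[3]: slot may become master of the device bus


def decode_srr(srr):
    """Decode a 4-bit SRR value into owner, fake flag and mastering right."""
    if srr & OWNED_A and srr & OWNED_B:
        # The description states that hardware prevents this combination.
        raise ValueError("both ownership bits set is an illegal condition")
    owner = "A" if srr & OWNED_A else "B" if srr & OWNED_B else None
    return {
        "owner": owner,
        "fake": bool(srr & FAKE),
        # The slot can master the bus only if arbitration is enabled
        # and the slot is owned by one of the processing sets.
        "can_master": bool(srr & ARB_EN) and owner is not None,
    }
```

Decoding `0b1100` gives a slot owned by B, not faked and able to master the bus; decoding `0b0101` gives a slot owned by B with the fake bit set and bus mastering disabled.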
Figure 26 illustrates the operation of the bridge 12 for direct memory access by a device such as one of
the devices 28, 29, 30, 31 and 32 to the memory 56 of the processing sets 14 and 16. When the D bus arbiter
185 receives a direct memory access (DMA) request 193 from a device (e.g., device 30 in slot 33) on the device
bus, the D bus arbiter determines whether to allocate the bus to that slot. As a result of this granting procedure,
the D bus arbiter knows the slot which has made the DMA request 193. The DMA request is supplied to the
address decoder 142 in the bridge, where the addresses associated with the request are decoded. The address
decoder is responsive to the D bus grant signal 194 for the slot concerned to identify the slot which has been
granted access to the D bus for the DMA request.

The address decode logic 142 holds or has access to a geographic address map 196, which identifies
the relationship between the processor address space and the slots as a result of the geographic address
employed. This geographic address map 196 could be held as a table in the bridge memory 126, along with the
posted write buffer 122 and the dirty RAM 124. Alternatively, it could be held as a table in a separate memory
element, possibly forming part of the address decoder 142 itself. The map 182 could be configured in a form
other than a table.

The address decode logic 142 is configured to verify the correctness of the DMA addresses supplied by
the device 30. In one embodiment of the invention, this is achieved by comparing four significant address bits
of the address supplied by the device 30 with the corresponding four address bits of the address held in the
geographic addressing map 196 for the slot identified by the D bus grant signal for the DMA request. In this
example, four address bits are sufficient to determine whether the address supplied is within the correct address
range. In this specific example, 32 bit PCI bus addresses are used, with bits 31 and 30 always being set to 1, bit
29 being allocated to identify which of two bridges on a motherboard is being addressed (see Figure 2) and bits
28 to 26 identifying a PCI device. Bits 25-0 define an offset from the base address for the address range for
each slot. Accordingly, by comparing bits 29-26, it is possible to identify whether the address(es) supplied
fall(s) within the appropriate address range for the slot concerned. It will be appreciated that in other
embodiments a different number of bits may need to be compared to make this determination depending upon
the allocation of the addresses.
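The comparison of bits 29-26 can be sketched as follows for the 32 bit address layout given in this specific example. The function names and the `make_base` helper are hypothetical; only the bit positions are taken from the description.

```python
def slot_field(addr):
    """Return bits 29-26 of a 32 bit PCI address (bridge bit plus device bits)."""
    return (addr >> 26) & 0xF


def dma_address_ok(addr, slot_base):
    """Compare bits 29-26 of the DMA address with the map entry for the slot."""
    return slot_field(addr) == slot_field(slot_base)


def make_base(bridge, device):
    """Hypothetical helper: build a slot base address from the bit layout.

    Bits 31 and 30 are always 1, bit 29 selects the bridge, bits 28-26
    identify the PCI device and bits 25-0 (the offset) are left at zero.
    """
    return (0b11 << 30) | ((bridge & 1) << 29) | ((device & 0b111) << 26)
```

Any offset in bits 25-0 leaves the compared field unchanged, so every address inside a slot's range passes the check, while an address built for a different device or bridge fails it.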

The address decode logic 142 could be arranged to use the bus grant signal 184 for the slot concerned 
to identify a table entry for the slot concerned and then to compare the address in that entry with the address(es) 
received with the DMA request as described above. Alternatively, the address decode logic 142 could be 
arranged to use the address(es) received with the DMA address to address a relational geographic address map 
and to determine a slot number therefrom, which could be compared to the slot for which the bus grant signal 
194 is intended and thereby to determine whether the addresses fall within the address range appropriate for the 
slot concerned. 

Either way, the address decode logic 142 is arranged to permit DMA to proceed if the DMA addresses 
fall within the expected address space for the slot concerned. Otherwise, the address decoder is arranged to 
ignore the slots and the physical addresses. 

The address decode logic 142 is further operable to control the routing of the DMA request to the
appropriate processing set(s) 14/16. If the bridge is in the combined mode, the DMA access will automatically
be allocated to all of the in-sync processing sets 14/16. The address decode logic 142 will be aware that the
bridge is in the combined mode as it is under the control of the bridge controller 132 (see Figure 8). However,
where the bridge is in the split mode, a decision will need to be made as to which, if any, of the processing sets
the DMA request is to be sent.

When the system is in split mode, the access will be directed to a processing set 14 or 16 which owns
the slot concerned. If the slot is un-owned, then the bridge does not respond to the DMA request. In the split
mode, the address decode logic 142 is operable to determine the ownership of the device originating the DMA
request by accessing the SRR 118 for the slot concerned. The appropriate slot can be identified by the D bus
grant signal. The address decode logic 142 is operable to control the target controller 140 (see Figure 8) to pass
the DMA request to the appropriate processing set(s) 14/16 based on the ownership bits SRR[1] and SRR[2]. If
bit SRR[1] is set, the first processing set 14 is the owner and the DMA request is passed to the first processing
set. If bit SRR[2] is set, the second processing set 16 is the owner and the DMA request is passed to the second
processing set. If neither of the bits SRR[1] and SRR[2] is set, then the DMA request is ignored by the address
decoder and is not passed to either of the processing sets 14 and 16.

Figure 27 is a flow diagram summarizing the DMA verification process as illustrated with reference to
Figure 24.

In stage S20, the D-bus arbiter 160 arbitrates for access to the D bus 22.

In stage S21, the address decoder 142 verifies the DMA addresses supplied with the DMA request by
accessing the geographic address map.

In stage S22, the address decoder ignores the DMA access where the address falls outside the expected
range for the slot concerned.

Alternatively, as represented by stage S23, the actions of the address decoder are dependent upon
whether the bridge is in the combined or the split mode.

If the bridge is in the combined mode, then in stage S24 the address decoder controls the target
controller 140 (see Figure 8) to cause the routing matrix 80 (see Figure 6) to pass the DMA request to both
processing sets 14 and 16.

If the bridge is in the split mode, the address decoder is operative to verify the ownership of the slot
concerned by reference to the SRR 118 for that slot in stage S25.

If the slot is allocated to the first processing set 14 (i.e. the SRR[1] bit is set), then in stage S26 the
address decoder 142 controls the target controller 140 (see Figure 8) to cause the routing matrix 80 (see Figure
6) to pass the DMA request to the first processing set 14.

If the slot is allocated to the second processing set 16 (i.e. the SRR[2] bit is set), then in stage S27 the
address decoder 142 controls the target controller 140 (see Figure 8) to cause the routing matrix 80 (see Figure
6) to pass the DMA request to the second processing set 16.

If the slot is unallocated (i.e. neither the SRR[1] bit nor the SRR[2] bit is set), then in stage S28 the
address decoder 142 ignores or discards the DMA request and the DMA request is not passed to the processing
sets 14 and 16.
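The routing decision summarised in the flow diagram can be expressed as a single function. This is a sketch only: the function name, the string values used for the mode and the use of a Python set as the return value are assumptions made for illustration, while the decision order follows the stages described above.

```python
SRR_OWNED_14 = 1 << 1  # SRR[1]: slot owned by the first processing set 14
SRR_OWNED_16 = 1 << 2  # SRR[2]: slot owned by the second processing set 16


def route_dma(mode, srr, address_ok):
    """Return the processing sets (by reference numeral) that receive a DMA request.

    mode       -- "combined" or "split" (string encoding is an assumption)
    srr        -- four-bit slot response register value for the granted slot
    address_ok -- result of the geographic address check for that slot
    """
    if not address_ok:
        return set()        # stage S22: address outside the expected range
    if mode == "combined":
        return {14, 16}     # stage S24: pass to both processing sets
    if srr & SRR_OWNED_14:
        return {14}         # stage S26: slot owned by processing set 14
    if srr & SRR_OWNED_16:
        return {16}         # stage S27: slot owned by processing set 16
    return set()            # un-owned slot: request ignored or discarded
```

Note that the ownership bits are consulted only in the split mode, which matches the statement that they have no effect in the combined mode.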

A DMA, or direct vector memory access (DVMA), request sent to one or more of the processing sets
causes the necessary memory operations (read or write as appropriate) to be effected on the processing set
memory.

There now follows a description of an example of a mechanism for enabling automatic recovery from
an EState (see Figure 11).

The automatic recovery process includes reintegration of the state of the processing sets to a common
status in order to attempt a restart in lockstep. To achieve this, the processing set which asserts itself as the primary
processing set as described above copies its complete state to the other processing set. This involves ensuring that
the content of the memory of both processors is the same before trying a restart in lockstep mode.

However, a problem with the copying of the content of the memory from one processing set to the other is
that during this copying process a device connected to the D bus 22 might attempt to make a direct memory access
(DMA) request for access to the memory of the primary processing set. If DMA is enabled, then a write made to
an area of memory which has already been copied would result in the memory state of the two processors at the end
of the copy not being the same. In principle, it would be possible to inhibit DMA for the whole of the copy
process. However, this would be undesirable, bearing in mind that it is desirable to minimise the time that the
system or the resources of the system are unavailable. As an alternative, it would be possible to retry the whole
copy operation when a DMA operation has occurred during the period of the copy. However, it is likely that
further DMA operations would be performed during the copy retry, and accordingly this is not a good option
either. Accordingly, in the present system, a dirty RAM 124 is provided in the bridge. As described earlier, the
dirty RAM 124 is configured as part of the bridge SRAM memory 126.

The dirty RAM 124 comprises a bit map having a dirty indicator, for example a dirty bit, for each block,
or page, of memory. The bit for a page of memory is set when a write access to the area of memory concerned is
made. In an embodiment of the invention one bit is provided for every 8K page of main processing set memory.
The bit for a page of processing set memory is set automatically by the address decoder 142 when this decodes a
DMA request for that page of memory for either of the processing sets 14 or 16 from a device connected to the D
bus 22. The dirty RAM can be reset, or cleared, when it is read by a processing set, for example by means of read
and clear instructions at the beginning of a copy pass, so that it can start to record pages which are dirtied since a
given time.
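A bit map with one dirty bit per 8K page and read-and-clear semantics can be sketched as follows. The class and method names are hypothetical, and the 32 bit word size is an assumption chosen for the example; the description states only that the dirty RAM is read word by word.

```python
PAGE_SIZE = 8 * 1024  # one dirty bit per 8K page, as in the described embodiment
WORD_BITS = 32        # word size for reading the bit map (an assumption)


class DirtyRam:
    """Hypothetical software model of a read-and-clear dirty bit map."""

    def __init__(self, num_pages):
        self.words = [0] * ((num_pages + WORD_BITS - 1) // WORD_BITS)

    def mark_write(self, address):
        """Set the dirty bit for the page containing a DMA write address."""
        page = address // PAGE_SIZE
        self.words[page // WORD_BITS] |= 1 << (page % WORD_BITS)

    def read_and_clear(self, word_index):
        """Return one word of dirty bits and clear it in the same operation."""
        value = self.words[word_index]
        self.words[word_index] = 0
        return value

    def dirty_pages(self):
        """Read and clear every word, returning the dirty page numbers."""
        pages = []
        for index in range(len(self.words)):
            word = self.read_and_clear(index)
            for bit in range(WORD_BITS):
                if word & (1 << bit):
                    pages.append(index * WORD_BITS + bit)
        return pages
```

Because reading clears the bits, a second call to `dirty_pages` reports only pages written since the first call, which is the property the copy passes rely on.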

The dirty RAM 124 can be read word by word. If a large word size is chosen for reading the dirty RAM 
124 this will optimise the reading and resetting of the dirty RAM 124. 

Accordingly, at the end of the copy pass the bits in the dirty RAM 124 will indicate those pages of
processing set memory which have been changed (or dirtied) by DMA writes during the period of the copy. A
further copy pass can then be performed for only those pages of memory which have been dirtied. This will take
less time than a full copy of the memory. Accordingly, there are typically fewer pages marked as dirty at the end of
the next copy pass and, as a result, the copy passes can become shorter and shorter. At some time it is necessary to
decide to inhibit DMA writes for a short period for a final, short, copy pass, at the end of which the memories of
the two processing sets will be the same and the primary processing set can issue a reset operation to restart the
combined mode.

The dirty RAM 124 is set and cleared in both the combined and split modes. This means that in split 
mode the dirty RAM 124 may be cleared by either processing set. 

The dirty RAM 124 address is decoded from bits 13 to 28 of the PCI address presented by the D bus 
device. Erroneous accesses which present illegal combinations of the address bits 29 to 31 are mapped into the 
dirty RAM 124 and a bit is dirtied on a write, even though the bridge will not pass these transactions to the 
processing sets. 
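The mapping from a PCI address to a dirty bit follows directly from the bit positions given above: bits 12 to 0 are the offset within an 8K page, bits 13 to 28 select the dirty bit, and bits 29 to 31 play no part in the selection. A minimal sketch, with a hypothetical function name:

```python
def dirty_index(pci_address):
    """Return the dirty bit index decoded from bits 13 to 28 of a PCI address.

    Bits 13 to 28 form a 16 bit field, so the index lies in 0..65535.
    Bits 29 to 31 are discarded, which is why accesses with illegal
    combinations of those bits still map to a dirty bit.
    """
    return (pci_address >> 13) & 0xFFFF
```

Two addresses that differ only in bits 29 to 31, or only in the offset within a page, therefore select the same dirty bit.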

When reading the dirty RAM 124, the bridge defines the whole area from 0x00008000 to 0x0000ffff
as dirty RAM and will clear the contents of any location in this range on a read.

As an alternative to providing a single dirty RAM 124 which is cleared on being read, it would be
possible to provide two dirty RAMs which are used in a toggle mode, with one being written to
while the other is read.

Figure 28 is a flow diagram summarising the operation of the dirty RAM 124.

In stage S41, the primary processing set reads the dirty RAM 124, which has the effect of resetting the
dirty RAM 124.

In stage S42, the primary processor (e.g. processing set 14) copies the whole of its memory 56 to the
memory 56 of the other processing set (e.g. processing set 16).

In stage S43, the primary processing set reads the dirty RAM 124, which has the effect of resetting the
dirty RAM 124.

In stage S44, the primary processor determines whether less than a predetermined number of bits have
been written in the dirty RAM 124.

If more than the predetermined number of bits have been set, then the processor in stage S45 copies those
pages of its memory 56 which have been dirtied, as indicated by the dirty bits read from the dirty RAM 124 in
stage S43, to the memory 56 of the other processing set. Control then passes back to stage S43.

If, in stage S44, it is determined that less than the predetermined number of bits have been written in the dirty
RAM 124, then in stage S46 the primary processor causes the bridge to inhibit DMA requests from the devices
connected to the D bus 22. This could, for example, be achieved by clearing the arbitration enable bit for each of
the device slots, thereby denying access of the DMA devices to the D bus 22. Alternatively, the address decoder
142 could be configured to ignore DMA requests under instructions from the primary processor. During the period
in which DMA accesses are prevented, the primary processor then makes a final copy pass from its memory to the
memory 56 of the other processor for those memory pages corresponding to the bits set in the dirty RAM 124.

In stage S47, the primary processor can issue a reset operation for initiating a combined mode.

In stage S48, DMA accesses are once more permitted.
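The iterative copy procedure can be sketched as a loop over copy passes. This is a minimal model under stated assumptions, not the processing set software: the function `reintegrate` and its callback parameters are hypothetical, page sets are represented as Python sets, and the final pass is assumed to cover both the pages last read from the dirty RAM and any pages dirtied before DMA was inhibited.

```python
def reintegrate(read_dirty, copy_pages, all_pages, set_dma_enabled, threshold):
    """Run copy passes until few pages are dirty, then do a final pass.

    read_dirty      -- callable returning the set of dirty pages and clearing them
    copy_pages      -- callable copying a set of pages to the other processing set
    all_pages       -- set of all page numbers of the primary's memory
    set_dma_enabled -- callable taking True or False to permit or inhibit DMA
    threshold       -- the predetermined number of dirty bits
    Returns the number of copy passes performed.
    """
    read_dirty()                      # stage S41: reset the dirty RAM
    copy_pages(all_pages)             # stage S42: full copy of memory
    passes = 1
    dirty = read_dirty()              # stage S43: read (and reset) dirty bits
    while len(dirty) >= threshold:    # stage S44: too many pages still dirty
        copy_pages(dirty)             # stage S45: re-copy only dirtied pages
        passes += 1
        dirty = read_dirty()          # back to stage S43
    set_dma_enabled(False)            # inhibit DMA for the final pass
    final = dirty | read_dirty()      # include pages dirtied since the last read
    copy_pages(final)                 # final, short, copy pass
    passes += 1
    # Stage S47: the reset operation initiating combined mode would be issued here.
    set_dma_enabled(True)             # stage S48: DMA permitted once more
    return passes
```

The loop terminates only if the number of pages dirtied per pass eventually falls below the threshold; a practical implementation would presumably also bound the number of passes, which the description does not specify.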

It will be appreciated that although particular embodiments of the invention have been described, many
modifications/additions and/or substitutions may be made within the spirit and scope of the present invention as
defined in the appended claims. For example, although in the specific description two processing sets are provided,
it will be appreciated that the specifically described features may be modified to provide for three or more
processing sets.

WHAT IS CLAIMED:

1. A bridge for a multi-processor system, the bridge comprising:

a first processor bus interface for connection to an I/O bus of a first processing set, the first processing
set including memory;

a second processor bus interface for connection to an I/O bus of a second processing set, the second
processing set including memory;

a device bus interface for connection to a device bus;

a bridge control mechanism configured to be operable to permit direct memory access to the memory
of the processing sets by a device on the device bus, to arbitrate between the first and the second processing sets
for access to the bridge in a first, split, mode, and to monitor lockstep operation of the first and second
processing sets in a second, combined, mode; and

a dirty RAM mechanism in the bridge for monitoring regions of processor set memory modified by
direct memory access by the device on the device bus.

2. The bridge of claim 1, wherein the dirty RAM mechanism defines a dirty indicator for each of a
plurality of regions of processing set memory, a dirty indicator being set to a predetermined value when the
region of memory has been written to by a DMA access.

3. The bridge of claim 2, wherein the dirty indicator is a dirty bit.

4. The bridge of claim 2, wherein the processing sets are configured such that one of the processing sets
is operable in the split mode as a primary processing set and to copy the content of its memory to the other
processing set.

5. The bridge of claim 3, wherein the primary processing set is operable at the end of a copy pass to
re-copy memory regions, which are identified in the dirty RAM mechanism as having been written to by virtue of
the corresponding dirty indication being set, from its memory to the memory of the other processing set.

6. The bridge of claim 4, wherein the bridge control mechanism comprises an arbiter configured to be 
operable in the split mode to arbitrate for access to the bridge by the first and second processors and a device on 
the device bus. 

7. The bridge of claim 6, wherein the bridge control mechanism is configured to be operable to respond
to a synchronization reset operation from the primary processing set, on completion of copying the content of
the memory regions identified in the dirty RAM mechanism with no further regions having been so identified,
to transfer from the split mode of operation to the combined mode of operation.

8. The bridge of claim 7, wherein the dirty RAM mechanism comprises a dirty RAM configured in
random access memory in the bridge.

9. The bridge of claim 6, wherein the content of the dirty RAM is cleared on being read by a processing
set.


10. The bridge of claim 1, comprising at least one further processor bus interface for connection to an I/O 
bus of a further processing set. 

11. A bridge for a multi-processor system, the bridge comprising means for interfacing with a first I/O bus
for a first processing set, a second I/O bus for a second processing set, and a device bus, means permitting direct
memory access to memory of the processing sets by a device on the device bus, means for arbitrating between
the first and the second processing sets for access to the bridge in a first, split, mode, means for monitoring
lockstep operation of the first and second processing sets in a second, combined, mode and dirty RAM means
for monitoring regions of processor set memory modified by direct memory accesses by the device on the
device bus.

12. A computer system comprising a first processing set having memory and a first I/O bus, a second
processing set having memory and a second I/O bus, a device bus, at least one device on the device bus and a
bridge, the bridge being connected to the first I/O bus, the second I/O bus and the device bus and comprising:

a bridge control mechanism configured to be operable to permit direct memory access to the memory
of the processing sets by the at least one device on the device bus, to arbitrate between the first and the second
processing sets for access to the bridge in a first, split, mode, and to monitor lockstep operation of the first and
second processing sets in a second, combined, mode; and

a dirty RAM mechanism in the bridge for monitoring regions of processor set memory modified by
direct memory accesses by the device on the device bus.

13. A computer system according to claim 12, wherein each processing set comprises at least one
processor, memory and a processing set I/O bus controller.

14. The computer system of claim 12, further comprising at least one further processing set. 

15. A method of operating a multi-processor system comprising a first processing set having memory and
a first I/O bus, a second processing set having memory and a second I/O bus, a device bus having at least one
device connected thereto, and a bridge, the bridge being connected to the first I/O bus, the second I/O bus and
the device bus, the method comprising:

permitting direct memory access to the memory of the processing sets by the at least one device on the
device bus; and

monitoring, in a dirty RAM in the bridge, regions of processor set memory modified by direct memory
access by the device on the device bus.

16. A method of re-integrating a fault tolerant multi-processor system comprising a first processing set
having memory and an I/O bus, a second processing set having memory and an I/O bus, a device bus having at
least one device connected thereto, and a bridge, the bridge being connected to the first I/O bus, the second I/O
bus and the device bus, the method comprising:

following a lockstep error, operating the system in a split mode in which one of the processing sets is
operable to copy its state to the other processing set, during which split mode direct memory access to memory
of the processing set by the at least one device on the device bus is permitted and regions of processor set
memory written to by the device are marked in a dirty RAM in the bridge;

conducting a number of times a step of copying areas of memory indicated in the dirty RAM as having
been dirtied since the start of a previous copy step.

17. The method of claim 16, wherein direct memory access is inhibited during a final copy step and then a
combined mode is initiated, in which combined mode lockstep operation of the first and second processing sets
is monitored.